Abstract. A method for estimating the cross-correlation $C_{xy}(\tau)$ of long-range correlated series $x(t)$ and $y(t)$, at varying lags $\tau$ and scales $n$, is proposed. For fractional Brownian motions with Hurst exponents $H_1$ and $H_2$, the asymptotic expression of $C_{xy}(\tau)$ depends only on the lag $\tau$ (wide-sense stationarity) and scales as a power of $n$ with exponent $H_1 + H_2$ for $\tau \to 0$. The method is illustrated on (i) financial series, to show the leverage effect; (ii) genomic sequences, to estimate the correlations between structural parameters along the chromosomes.

1. Introduction and overview

Interdependent behaviour and causality in coupled complex systems continue to attract considerable interest in fields as diverse as solid state science, biology, physiology and climatology [1-8]. Coupling and synchronization effects have been observed, for example, in cardiorespiratory interactions, in neural signals, in glacial variability and in Milankovitch forcing [9, 10, 11]. In finance, the leverage effect quantifies the cause-effect relation between return $r(t)$ and volatility $\sigma_T(t+\tau)$ and, eventually, financial risk estimates [12-23]. In DNA sequences, causal connections among structural and compositional properties such as intrinsic curvature, flexibility, stacking energy and nucleotide composition are sought to unravel the mechanisms underlying biological processes in cells [24, 25].

Many issues still remain unsolved, mostly due to problems with the accuracy and resolution of coupling estimates in long-range correlated signals. Such signals do not show the wide-sense stationarity needed to yield statistically meaningful information when cross-correlations and cross-spectra are estimated. In [27, 28], a function $F_{xy}(n)$, based on the detrended fluctuation analysis (a measure of autocorrelation of a series at different scales $n$), has been proposed to estimate the cross-correlation of two series $x(t)$ and $y(t)$. However, the function $F_{xy}(n)$ is independent of the lag $\tau$, since it is a straightforward generalization of the detrended fluctuation analysis, which is a positive-defined measure of autocorrelation for long-range correlated series. Therefore, $F_{xy}(n)$ holds only for $\tau = 0$. Unlike the autocorrelation, the cross-correlation of two long-range correlated signals is a non-positive-defined function of $\tau$, since the coupling could be delayed and have any sign.

In this work, a method to estimate the cross-correlation function $C_{xy}(\tau)$ between two long-range correlated signals at different scales $n$ and lags $\tau$ is developed. The asymptotic expression of $C_{xy}(\tau)$ is worked out for fractional Brownian motions $B_H(t)$, $H$ being the Hurst exponent, whose interest follows from their widespread use for modeling long-range correlated processes in different areas [21]. Finally, the method is used to investigate the coupling between (i) returns and volatility of the DAX stock index and (ii) structural properties, such as deformability, stacking energy, position preference and propeller twist, of the Escherichia coli chromosome.

The proposed method operates: (i) on the integrated rather than on the increment series, thus yielding the cross-correlation at varying windows $n$, as opposed to the standard cross-correlation; (ii) as a sliding product of two series, thus yielding the cross-correlation as a function of the lag $\tau$, as opposed to the method proposed in [27, 28]. The features (i) and (ii) imply higher accuracy and $n$-windowed resolution while capturing the cross-correlation at varying lags $\tau$.

2. Method

The cross-correlation $C_{xy}(t,\tau)$ of two nonstationary stochastic processes $x(t)$ and $y(t)$ is defined as:

$$C_{xy}(t,\tau) = \left\langle \left[x(t) - \eta_x(t)\right]\left[y^*(t+\tau) - \eta_y^*(t+\tau)\right] \right\rangle \qquad (2.1)$$

where $\eta_x(t)$ and $\eta_y^*(t+\tau)$ indicate the time-dependent means of $x(t)$ and $y^*(t+\tau)$, the symbol $*$ indicates the complex conjugate and the brackets $\langle\,\rangle$ indicate the ensemble average over the joint domain of $x(t)$ and $y^*(t+\tau)$. This relationship holds for space-dependent sequences, such as the chromosomes, by replacing the time with the space coordinate. Eq. (2.1) yields sound information provided the two quantities in square brackets are jointly stationary, so that $C_{xy}(t,\tau) = C_{xy}(\tau)$ is a function only of the lag $\tau$.

In this work, we propose to estimate the cross-correlation of two nonstationary signals by choosing for $\eta_x(t)$ and $\eta_y^*(t+\tau)$ in Eq. (2.1), respectively, the functions:

$$\tilde{x}_n(t) = \frac{1}{n}\sum_{k=0}^{n} x(t-k) \qquad (2.2)$$

and

$$\tilde{y}_n^*(t+\tau) = \frac{1}{n}\sum_{k=0}^{n} y^*(t+\tau-k) \qquad (2.3)$$
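The estimator defined by Eqs. (2.1)-(2.3) can be sketched in a few lines of Python for real-valued, equally sampled series. This is only a minimal illustration under stated assumptions: the names `trailing_mean` and `cross_correlation` are hypothetical, the ensemble average is replaced by an average over the available times $t$, and the moving average is taken over exactly $n$ points, $x(t), \dots, x(t-n+1)$, so that it is a proper local mean.

```python
import numpy as np

def trailing_mean(x, n):
    """Mean of x over the n most recent points, for t = n-1, ..., len(x)-1."""
    c = np.cumsum(np.insert(np.asarray(x, dtype=float), 0, 0.0))
    return (c[n:] - c[:-n]) / n

def cross_correlation(x, y, n, tau):
    """Time-averaged estimate of C_xy(tau) at window n (real-valued series).

    Assumption of this sketch: the ensemble average of Eq. (2.1) is
    replaced by the mean over all times where both residuals exist.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x[n - 1:] - trailing_mean(x, n)   # x(t) minus its moving average
    dy = y[n - 1:] - trailing_mean(y, n)   # y(t) minus its moving average
    m = min(len(dx), len(dy))
    dx, dy = dx[:m], dy[:m]
    if abs(tau) >= m:
        raise ValueError("lag too large for the series length")
    if tau >= 0:
        return float(np.mean(dx[:m - tau] * dy[tau:]))
    return float(np.mean(dx[-tau:] * dy[:m + tau]))
```

For a linear ramp the residual is constant, $(n-1)/2$, so `cross_correlation(ramp, ramp, 5, tau)` returns 4 for any admissible lag, which is a convenient sanity check.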



2.1. Wide-sense stationarity 



The wide-sense stationarity of Eq. (2.1) can be demonstrated for fractional Brownian motions. By taking $x(t) = B_{H_1}(t)$, $y(t) = B_{H_2}(t)$, with $\eta_x(t)$ and $\eta_y^*(t+\tau)$ calculated according to Eqs. (2.2, 2.3), $C_{xy}(t,\tau)$ reads:

$$C_{xy}(t,\tau) = \left\langle \left[B_{H_1}(t) - \tilde{B}_{H_1}(t)\right]\left[B^*_{H_2}(t+\tau) - \tilde{B}^*_{H_2}(t+\tau)\right] \right\rangle . \qquad (2.4)$$

When writing $x(t) = B_{H_1}(t)$ and $y(t) = B_{H_2}(t)$, we assume the same underlying generating noise $dB(t)$ to produce a sample of $x$ and $y$. Eq. (2.4) is calculated in the limit of large $n$ (calculation details are reported in the Appendix). One obtains:



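$$C_{xy}(\hat{\tau}) = n^{H_1+H_2}\, D_{H_1,H_2} \left[ -\hat{\tau}^{H_1+H_2} + \frac{(1+\hat{\tau})^{1+H_1+H_2} + (1-\hat{\tau})^{1+H_1+H_2}}{1+H_1+H_2} - \frac{(1-\hat{\tau})^{2+H_1+H_2} - 2\hat{\tau}^{2+H_1+H_2} + (1+\hat{\tau})^{2+H_1+H_2}}{(1+H_1+H_2)(2+H_1+H_2)} \right] \qquad (2.5)$$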



where $\hat{\tau} = \tau/n$ is the scaled lag and $D_{H_1,H_2}$ is defined in the Appendix. Eq. (2.5) is independent of $t$, since the terms in square brackets depend only on $\hat{\tau} = \tau/n$, and thus Eq. (2.1) is made wide-sense stationary. It is worth noting that, in Eq. (2.5), the coupling between $B_{H_1}(t)$ and $B_{H_2}(t)$ reduces to the sum of the exponents $H_1 + H_2$. Eq. (2.5), for $\tau = 0$, reduces to:

$$C_{xy}(0) \propto n^{H_1+H_2} , \qquad (2.6)$$

indicating that the coupling between $B_{H_1}(t)$ and $B_{H_2}(t)$ scales as the product of $n^{H_1}$ and $n^{H_2}$. The property of the variance of fractional Brownian motion $B_H(t)$ to scale as $n^{2H}$ is recovered from Eq. (2.6) for $x = y$ and $H_1 = H_2 = H$, i.e.:

$$C_{xx}(0) \propto n^{2H} . \qquad (2.7)$$
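The scaling statements of Eqs. (2.6) and (2.7) can be checked directly on the asymptotic expression. The sketch below evaluates Eq. (2.5) in the form obtained from the Appendix integral (A.11) with $\theta = 0$, i.e. $n^{H_1+H_2} D\,[-\hat\tau^{h} + ((1+\hat\tau)^{1+h} + (1-\hat\tau)^{1+h})/(1+h) - ((1-\hat\tau)^{2+h} - 2\hat\tau^{2+h} + (1+\hat\tau)^{2+h})/((1+h)(2+h))]$ with $h = H_1 + H_2$. The function name and the default prefactor `D=1.0` are assumptions made for illustration only.

```python
import numpy as np

def c_xy_asymptotic(tau, n, H1, H2, D=1.0):
    """Large-n expression of C_xy for 0 <= tau < n (sketch of Eq. 2.5).

    D stands for the normalization factor D_{H1,H2}; it is left as a
    free parameter here.
    """
    h = H1 + H2
    s = np.asarray(tau, dtype=float) / n            # scaled lag tau/n
    single = ((1 + s) ** (1 + h) + (1 - s) ** (1 + h)) / (1 + h)
    double = ((1 - s) ** (2 + h) - 2 * s ** (2 + h)
              + (1 + s) ** (2 + h)) / ((1 + h) * (2 + h))
    return D * n ** h * (-s ** h + single - double)

# At zero lag the bracket equals 2 / (2 + h), so C_xy(0) grows as n**(H1 + H2).
h = 0.5 + 0.77
ratio = c_xy_asymptotic(0, 200, 0.5, 0.77) / c_xy_asymptotic(0, 100, 0.5, 0.77)
slope = np.log(ratio) / np.log(2.0)                 # equals H1 + H2
```

Because the bracket depends on $\tau$ and $n$ only through $\tau/n$, dividing by $n^{H_1+H_2}$ gives the same value for any pair $(\tau, n)$ with the same ratio.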



Eq. (2.7) has been studied in 



3. Examples 

3.1. Financial series 

The leverage effect is a stylized fact of finance. The level of volatility is related to whether returns are negative or positive. Volatility rises when a stock's price drops and falls when the stock goes up [12]. Furthermore, the impact of negative returns on volatility seems much stronger than the impact of positive returns (down market effect) [16, 17]. To illustrate these effects, we analyze the correlation between returns and volatility of the DAX stock index $P(t)$, sampled every minute from 2-Jan-1997 to 22-Mar-2004, shown in Fig. 1(a). The returns and volatility are defined respectively as: $r(t) = \ln P(t+t') - \ln P(t)$ and $\sigma_T(t) = \sqrt{\sum_{t=1}^{T}\left[r(t) - \overline{r(t)}_T\right]^2/(T-1)}$.
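As an illustration of these two definitions, the following sketch computes log-returns and a rolling volatility with NumPy. The function names are hypothetical, and the alignment of the volatility window (here, $T$ consecutive returns starting at each position) is an assumption, since the text does not specify it.

```python
import numpy as np

def log_returns(prices, lag=1):
    """r(t) = ln P(t + lag) - ln P(t), with lag in sampling steps."""
    p = np.asarray(prices, dtype=float)
    return np.log(p[lag:]) - np.log(p[:-lag])

def rolling_volatility(returns, T):
    """Sample standard deviation (denominator T - 1) over windows of T returns."""
    r = np.asarray(returns, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(r, T)
    return windows.std(axis=1, ddof=1)
```

With minute-sampled prices, a return horizon of one hour would correspond to `lag=60` and a volatility window of 300 h to `T=300` applied to hourly returns; these conversions are only indicative.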

Fig. 1(b) shows the returns for $t' = 1$ h. The volatility series are shown in Figs. 1(c,d) for $T = 300$ h and $T = 660$ h, respectively. The Hurst exponents, calculated from the slope of the log-log plot of Eq. (2.7) as a function of $n$, are $H = 0.50$ (return), $H = 0.77$ (volatility, $T = 300$ h) and $H = 0.80$ (volatility, $T = 660$ h). Fig. 2 shows the log-log plots of $C_{xx}(0)$ for the returns (squares) and the volatility with $T = 660$ h (triangles). The scaling law exhibited by the DAX series indicates that its behaviour is consistent with that of a fractional Brownian motion. The function $C_{xy}(0)$ with $x = r(t)$ and $y = \sigma_T(t)$ with $T = 660$ h is also plotted at varying $n$ in Fig. 2 (circles). From the slope of the log-log plot of $C_{xy}(0)$ vs $n$, one obtains $H = 0.65$, i.e. the average of $H_1 = 0.50$ and $H_2 = 0.80$, as expected from Eq. (2.6).
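The slope-based estimate of the Hurst exponent can be tried on a synthetic series whose exponent is known. The sketch below uses an ordinary random walk ($H = 1/2$) rather than the DAX data, which are not reproduced here; the function names, the window list and the use of an $n$-point trailing mean are assumptions of this example.

```python
import numpy as np

def detrended_variance(x, n):
    """C_xx(0) at window n: mean squared deviation from the trailing n-point mean."""
    x = np.asarray(x, dtype=float)
    c = np.cumsum(np.insert(x, 0, 0.0))
    moving_average = (c[n:] - c[:-n]) / n
    return float(np.mean((x[n - 1:] - moving_average) ** 2))

def hurst_from_scaling(x, windows):
    """Half the slope of log C_xx(0) versus log n, following Eq. (2.7)."""
    values = [detrended_variance(x, n) for n in windows]
    slope, _intercept = np.polyfit(np.log(windows), np.log(values), 1)
    return 0.5 * slope

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(200_000))   # ordinary random walk, H = 1/2
H_est = hurst_from_scaling(walk, [20, 40, 80, 160])
```

For unit-variance steps the detrended variance of a random walk is close to $n/3$ at large $n$, so the fitted exponent should be near 0.5, with a small upward bias at the shortest windows.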



Next, the cross-correlation is considered as a function of $\tau$. The plots of $C_{xy}(\tau)$ for $x = r(t)$ and $y = \sigma_T(t)$ with $T = 300$ h and $T = 660$ h are shown respectively in Fig. 3(a,b) at different windows $n$.

The function $C_{xy}(\tau)$ for $x = r(t)$ and $y = \sigma_T(t)^2$ is shown in Fig. 3(c). The cross-correlation takes negative values at small $\tau$ and reaches its minimum at about 10-12 days. This indicates that the volatility increases with negative returns (i.e. with price drops). Then $C_{xy}(\tau)$ changes sign, relaxing asymptotically to zero from positive values at large $\tau$. The positive values of $C_{xy}(\tau)$ indicate that the volatility decreases when the returns become positive (i.e. when the price rises) and are related to the restored equilibrium within the market (positive rebound days). It is worth remarking that the (positive) maximum of the cross-correlation is always smaller in magnitude than the (negative) minimum. This is the stylized fact known as the down market effect. A relevant feature exhibited by the curves in Figs. 3(a-c) is that the zeroes and the extremes of $C_{xy}(\tau)$ occur at the same values of $\tau$, which is consistent with wide-sense stationarity for all the values of $n$. A further check of wide-sense stationarity is provided by the plot of the function $C_{xy}(\tau)\,n^{-(H_1+H_2)}$. In Fig. 4, $C_{xy}(\tau)\,n^{-(H_1+H_2)}$ is plotted with $x = r(t)$ and $y = \sigma_T(t)$ with $T = 300$ h, $H_1 = 0.5$ and $H_2 = 0.77$; $n$ ranges from 100 to 500 with step 100. One can note that the five curves collapse, in accord with the invariance of the product $C_{xy}(\tau)\,n^{-(H_1+H_2)}$ with $n$.

Figure 1. DAX stock index: (a) prices; (b) returns with $t' = 1$ h; (c) volatility with $T = 300$ h; (d) volatility with $T = 660$ h.

Figure 2. Log-log plot of $C_{xx}(0)$ for the DAX return (squares) and volatility (triangles) and of $C_{xy}(0)$ with $x = r(t)$ and $y = \sigma_T(t)$ (circles). Red lines are linear fits. The power-law behaviour is consistent with Eqs. (2.6, 2.7).

Figure 3. Cross-correlation $C_{xy}(\tau)$ with $x = r(t)$ and $y = \sigma_T(t)$ with (a) $T = 300$ h and (b) $T = 660$ h; (c) with $x = r(t)$ and $y = \sigma_T(t)^2$ with $T = 660$ h. $n$ ranges from 100 to 500 with step 100.

Figure 4. Plot of the function $C_{xy}(\tau)\,n^{-(H_1+H_2)}$ with $x = r(t)$ and $y = \sigma_T(t)$ with $T = 300$ h, $H_1 = 0.5$ and $H_2 = 0.77$. $n$ ranges from 100 to 500 with step 100. One can note that the five curves collapse, within the numerical errors of the parameters entering the auto- and cross-correlation functions. This is in accord with the invariance of the product $C_{xy}(\tau)\,n^{-(H_1+H_2)}$ with the window $n$.

Figure 5. Leverage function with volatility windows $T = 100$ h, 300 h, 660 h, 1000 h. The value of $n$ is 400 for all the curves.
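The collapse of the rescaled curves can be reproduced on synthetic data. The sketch below uses the autocorrelation case $x = y$ for an ordinary random walk ($H_1 = H_2 = 1/2$), evaluated at the fixed scaled lag $\tau/n = 1/2$ for three windows; it is not the DAX analysis. The helper names, the $n$-point trailing mean and the time average in place of the ensemble average are assumptions of this example.

```python
import numpy as np

def residual(x, n):
    """x(t) minus its trailing n-point moving average, for t >= n - 1."""
    x = np.asarray(x, dtype=float)
    c = np.cumsum(np.insert(x, 0, 0.0))
    return x[n - 1:] - (c[n:] - c[:-n]) / n

def c_xx(x, n, tau):
    """Time-averaged C_xx(tau) at window n, for tau >= 0."""
    d = residual(x, n)
    return float(np.mean(d[:len(d) - tau] * d[tau:]))

rng = np.random.default_rng(1)
walk = np.cumsum(rng.standard_normal(500_000))   # random walk, H = 1/2
H = 0.5
# Rescaled values at the same scaled lag tau/n = 1/2 for three windows.
rescaled = {n: c_xx(walk, n, n // 2) / n ** (2 * H) for n in (50, 100, 200)}
```

For unit-variance steps the three rescaled values should all lie close to $5/48 \approx 0.104$, the large-$n$ value for this scaled lag, up to sampling noise and finite-$n$ corrections.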



In Fig. 5, the leverage correlation function $\mathcal{L}(\tau) = \langle \sigma_T(t+\tau)^2\, r(t) \rangle / \langle r(t)^2 \rangle^2$ according to the definition put forward in is plotted for different volatility windows $T$. The function $\langle \sigma_T(t+\tau)^2\, r(t) \rangle$ has been calculated by means of Eq. (2.1). The negative values of the cross-correlation (at smaller $\tau$) and the following positive values (positive rebound days) at larger $\tau$ can be clearly observed for several volatility windows $T$. The function $\mathcal{L}(\tau)$ for the DAX stock index, estimated by means of the standard cross-correlation function, is shown in Figs. 1,2 of Ref. [20]. By comparing the curves shown in Fig. 5 to those of Ref. [20], one can note the higher resolution related to the possibility to detect the correlation at smaller lags (note that the $\tau$ unit is hours, while in Refs. [15, 18, 20, 21] it is days) and at varying windows $n$, implying the possibility to estimate the degree of cross-correlation at different frequencies. As a final remark, we mention that the cross-correlation function between a fractional Brownian motion and its own width can be computed analytically in the large $n$ limit, following the derivation in the Appendix for two general fBms. The width of a fBm is one possible definition for the volatility, therefore the derivation in the Appendix provides a straightforward estimate of the leverage function.

3.2. Genomic Sequences

Several studies aim to quantify cross-correlations among nucleotide position, intrinsic curvature and flexibility of the DNA helix, which may ultimately shed light on biological processes such as protein targeting and transcriptional regulation [24, 25, 26]. One problem to overcome is the comparison of DNA fragments at di- and trinucleotide scales, hence the need for high-precision numerical techniques. We consider deformability, stacking energy, propeller twist and position preference sequences of the Escherichia coli chromosome. The sequences, with details about the methods used to synthesize/measure the structural properties, are available at the CBS database - Center for Biological Sequence Analysis of the Technical University of Denmark (http://www.cbs.dtu.dk/services/genomeAtlas/). In order to apply the proposed method, the average value is subtracted from the data, which are subsequently integrated to obtain the paths shown in Fig. 6. The series are 4938919 bp long and have Hurst exponents: $H = 0.70$ (deformability), $H = 0.65$ (position preference), $H = 0.73$ (stacking energy), $H = 0.70$ (propeller twist).
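The preprocessing step described above (mean subtraction followed by integration) amounts to a cumulative sum of the centred data. A minimal sketch, with a hypothetical function name, is:

```python
import numpy as np

def integrated_profile(values):
    """Subtract the average value, then integrate (cumulative sum) to obtain the path."""
    v = np.asarray(values, dtype=float)
    return np.cumsum(v - v.mean())
```

By construction the path returns to zero at its last point, since the centred values sum to zero.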

The cross-correlation functions $C_{xy}(\tau)$ between deformability, stacking energy, propeller twist and position preference are shown in Fig. 7(a-e). There is in general a remarkable cross-correlation along the DNA chain, indicating the existence of interrelated patches of the structural and compositional parameters. The high correlation level between DNA flexibility measures and protein complexes indicates that the conformation adopted by the DNA bound to a protein depends on the inherent structural features of the DNA. It is worth remarking that the present method provides the dependence of the coupling along the DNA chain rather than simply the values of the linear correlation coefficient $r$. In Table 4 of Ref. [26] one can find the following values of the correlation obtained by either numerical analysis or experimental measurements (in parentheses) over DNA fragments: (a) $r = -0.80$ ($-0.86$); (b) $r = 0.06$ ($0.00$); (c) $r = -0.15$ ($-0.22$); (d) $r = -0.74$ ($-0.82$); (e) $r = -0.80$ ($-0.87$). Moreover, also for the genomic sequences the function $C_{xy}(\tau)\,n^{-(H_1+H_2)}$ is independent of $n$ within the numerical errors of the parameters entering the auto- and cross-correlation functions. In Fig. 8, $C_{xy}(\tau)\,n^{-(H_1+H_2)}$ is shown for $x(t)$ the deformability and $y(t)$ the stacking energy, with $H_1 = 0.7$ and $H_2 = 0.73$. $n$ ranges from 100 to 500 with step 100.



4. Conclusions 



A high-resolution, lag-dependent, non-parametric technique based on Eqs. (2.1-2.3) to measure cross-correlation in long-range correlated series has been developed. The technique has been implemented on (i) financial returns and volatilities and (ii) structural properties of genomic sequences [35]. The results clearly show the existence of coupling regimes characterized by positive-negative feedback between the systems at different lags $\tau$ and windows $n$. We point out that, in principle, other methods might be generalized in order to yield estimates of the cross-correlation between long-range correlated series at varying $\tau$ and $n$. However, techniques operating over the series by means of a box division, such as DFA and the R/S method, are a priori excluded. The box division causes discontinuities in the sliding product of the two series at the extremes of each box, and ultimately incorrect estimates of the cross-correlation. The present method is not affected by this drawback, since Eqs. (2.1-2.3) do not require a box division.

Figure 6. Structural sequences of the Escherichia coli chromosome.



Appendix A. Details of the calculation: 



Let us start from Eq. (2.4):

$$C_{xy}(t,\tau) = \left\langle \left[B_{H_1}(t) - \tilde{B}_{H_1}(t)\right]\left[B^*_{H_2}(t+\tau) - \tilde{B}^*_{H_2}(t+\tau)\right] \right\rangle \qquad (A.1)$$

that, after multiplying the terms in brackets, becomes:

$$C_{xy}(t,\tau) = \left\langle B_{H_1}(t)B^*_{H_2}(t+\tau) - B_{H_1}(t)\tilde{B}^*_{H_2}(t+\tau) - \tilde{B}_{H_1}(t)B^*_{H_2}(t+\tau) + \tilde{B}_{H_1}(t)\tilde{B}^*_{H_2}(t+\tau) \right\rangle \qquad (A.2)$$

Figure 7. Cross-correlation $C_{xy}(\tau)$ between (a) deformability and stacking energy; (b) position preference and deformability; (c) propeller twist and position preference; (d) propeller twist and stacking energy; (e) propeller twist and deformability. $n$ ranges from 100 to 500 with step 100.

Figure 8. Plot of the function $C_{xy}(\tau)\,n^{-(H_1+H_2)}$ with $x(t)$ the deformability, $y(t)$ the stacking energy, $H_1 = 0.7$ and $H_2 = 0.73$. $n$ ranges from 100 to 500 with step 100. One can note that the five curves collapse, within the numerical errors of the parameters entering the auto- and cross-correlation functions. This is in accord with the invariance of the product $C_{xy}(\tau)\,n^{-(H_1+H_2)}$ with the window $n$.

In general, the moving average may be referred to any point of the moving window, a feature expressed by replacing Eqs. (2.2, 2.3) with

$$\tilde{x}_n(t) = \frac{1}{n}\sum_{k=-\theta n}^{n-\theta n} x(t-k), \qquad \tilde{y}_n(t+\tau) = \frac{1}{n}\sum_{k=-\theta n}^{n-\theta n} y(t+\tau-k) \qquad (A.3)$$

with $0 \le \theta \le 1$. In the limit of $n \to \infty$, the sums can be replaced by integrals, so that:

$$\tilde{x}(\hat{t}) = \int_{-\theta}^{1-\theta} x(\hat{t}-\hat{k})\, d\hat{k}, \qquad \tilde{y}(\hat{t}+\hat{\tau}) = \int_{-\theta}^{1-\theta} y(\hat{t}+\hat{\tau}-\hat{k})\, d\hat{k} \qquad (A.4)$$

where $t = n\hat{t}$, $\tau = n\hat{\tau}$, $k = n\hat{k}$. For the sake of simplicity, the analytical derivation will be done by using the harmonizable representation of the fractional Brownian motion [36, 38, 39]:



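$$B_H(t) = \int_{-\infty}^{+\infty} \frac{e^{it\xi} - 1}{|\xi|^{H+\frac{1}{2}}}\, d\bar{B}(\xi) \qquad (A.5)$$

where $d\bar{B}(\xi)$ is a representation of $dB(t)$ in the $\xi$ domain. In the following we will consider the case of $t > 0$ and $t + \tau > 0$. By using Eq. (A.5), the cross-correlation of two fBms $B_{H_1}(t)$ and $B_{H_2}(t+\tau)$ can be written as:

$$\left\langle B_{H_1}(t)\, B^*_{H_2}(t+\tau) \right\rangle = \left\langle \int_{-\infty}^{+\infty} \frac{e^{it\xi} - 1}{|\xi|^{H_1+\frac{1}{2}}}\, d\bar{B}(\xi) \int_{-\infty}^{+\infty} \frac{e^{-i(t+\tau)\eta} - 1}{|\eta|^{H_2+\frac{1}{2}}}\, d\bar{B}(\eta) \right\rangle \qquad (A.6)$$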



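Since $d\bar{B}$ is Gaussian, the following property holds for any $f, g \in L^2$:

$$\left\langle \int_{-\infty}^{+\infty} f(\xi)\, d\bar{B}(\xi) \int_{-\infty}^{+\infty} g(\eta)\, d\bar{B}(\eta) \right\rangle = \int_{-\infty}^{+\infty} f(\xi)\, g(\xi)\, d\xi \qquad (A.7)$$

By using Eq. (A.7), after some algebra Eq. (A.6) reads:

$$\left\langle B_{H_1}(t)\, B^*_{H_2}(t+\tau) \right\rangle = D_{H_1,H_2} \left[ t^{H_1+H_2} + (t+\tau)^{H_1+H_2} - |\tau|^{H_1+H_2} \right] \qquad (A.8)$$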



where $D_{H_1,H_2}$ is a normalization factor which depends on $H_1$ and $H_2$. In the harmonizable representation of fBm, $D_{H_1,H_2}$ takes the following form

$$D_{H_1,H_2} = D_{H_1+H_2} = -\frac{2}{\pi} \cos\left[\frac{(H_1+H_2)\pi}{2}\right] \Gamma\left[-(H_1+H_2)\right] \qquad (A.9)$$

normalized such that $D_{H_1,H_2} = 1$ when $H_1 = H_2 = \frac{1}{2}$. Different representations of the fBm lead to different values of the coefficient $D_{H_1,H_2}$ [29, 30].



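Eq. (A.8) can be used to calculate each of the four terms in the right hand side of Eq. (A.2). The mean value of each term in Eq. (A.2) is obtained from the general formula in Eq. (A.8); thus, substituting the right hand side of Eq. (A.8) and Eq. (A.4) into each term in Eq. (A.2) we obtain:

$$
\begin{aligned}
C_{xy}(\hat{t},\hat{\tau},\theta) = n^{H_1+H_2} D_{H_1,H_2} \Bigg[ & \left( \hat{t}^{H_1+H_2} + (\hat{t}+\hat{\tau})^{H_1+H_2} - |\hat{\tau}|^{H_1+H_2} \right) \\
& - \left( \hat{t}^{H_1+H_2} + \int_{-\theta}^{1-\theta} |\hat{t}-\hat{h}+\hat{\tau}|^{H_1+H_2}\, d\hat{h} - \int_{-\theta}^{1-\theta} |\hat{\tau}-\hat{h}|^{H_1+H_2}\, d\hat{h} \right) \\
& - \left( \int_{-\theta}^{1-\theta} |\hat{t}-\hat{k}|^{H_1+H_2}\, d\hat{k} + (\hat{t}+\hat{\tau})^{H_1+H_2} - \int_{-\theta}^{1-\theta} |\hat{\tau}+\hat{k}|^{H_1+H_2}\, d\hat{k} \right) \\
& + \left( \int_{-\theta}^{1-\theta} |\hat{t}-\hat{k}|^{H_1+H_2}\, d\hat{k} + \int_{-\theta}^{1-\theta} |\hat{t}-\hat{h}+\hat{\tau}|^{H_1+H_2}\, d\hat{h} - \int_{-\theta}^{1-\theta}\!\!\int_{-\theta}^{1-\theta} |\hat{\tau}-\hat{h}+\hat{k}|^{H_1+H_2}\, d\hat{h}\, d\hat{k} \right) \Bigg]
\end{aligned}
\qquad (A.10)
$$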



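where each term in round parentheses corresponds to each of the four terms in Eq. (A.2). Summing the terms in Eq. (A.10), one can notice that the time $t$ cancels out; thus one finally obtains:

$$C_{xy}(\hat{\tau},\theta) = n^{H_1+H_2} D_{H_1,H_2} \left[ -|\hat{\tau}|^{H_1+H_2} + \int_{-\theta}^{1-\theta} |\hat{\tau}-\hat{h}|^{H_1+H_2}\, d\hat{h} + \int_{-\theta}^{1-\theta} |\hat{\tau}+\hat{k}|^{H_1+H_2}\, d\hat{k} - \int_{-\theta}^{1-\theta}\!\!\int_{-\theta}^{1-\theta} |\hat{\tau}-\hat{h}+\hat{k}|^{H_1+H_2}\, d\hat{h}\, d\hat{k} \right] \qquad (A.11)$$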



Consistently with the large $n$ limit, we take $\tau < n$, namely $\hat{\tau} < 1$. The integral (A.11) admits four different solutions, depending on the values taken by the parameters $\hat{\tau}$ and $\theta$. Let us consider each case separately.



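Case 1: $\hat{\tau} < \theta$ and $\hat{\tau} + \theta < 1$

$$C_{xy}(\hat{\tau},\theta) = n^{H_1+H_2} D_{H_1,H_2} \left[ -\hat{\tau}^{H_1+H_2} + \frac{(1+\hat{\tau}-\theta)^{1+H_1+H_2} + (\theta-\hat{\tau})^{1+H_1+H_2} + (1-\hat{\tau}-\theta)^{1+H_1+H_2} + (\hat{\tau}+\theta)^{1+H_1+H_2}}{1+H_1+H_2} - \frac{(1-\hat{\tau})^{2+H_1+H_2} - 2\hat{\tau}^{2+H_1+H_2} + (1+\hat{\tau})^{2+H_1+H_2}}{(1+H_1+H_2)(2+H_1+H_2)} \right] \qquad (A.12)$$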



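Case 2: $\hat{\tau} < \theta$ and $\hat{\tau} + \theta > 1$

$$C_{xy}(\hat{\tau},\theta) = n^{H_1+H_2} D_{H_1,H_2} \left[ -\hat{\tau}^{H_1+H_2} + \frac{(1+\hat{\tau}-\theta)^{1+H_1+H_2} + (\theta-\hat{\tau})^{1+H_1+H_2} - (\hat{\tau}+\theta-1)^{1+H_1+H_2} + (\hat{\tau}+\theta)^{1+H_1+H_2}}{1+H_1+H_2} - \frac{(1-\hat{\tau})^{2+H_1+H_2} - 2\hat{\tau}^{2+H_1+H_2} + (1+\hat{\tau})^{2+H_1+H_2}}{(1+H_1+H_2)(2+H_1+H_2)} \right] \qquad (A.13)$$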



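Case 3: $\hat{\tau} > \theta$ and $\hat{\tau} + \theta < 1$

$$C_{xy}(\hat{\tau},\theta) = n^{H_1+H_2} D_{H_1,H_2} \left[ -\hat{\tau}^{H_1+H_2} + \frac{(1+\hat{\tau}-\theta)^{1+H_1+H_2} - (\hat{\tau}-\theta)^{1+H_1+H_2} + (1-\hat{\tau}-\theta)^{1+H_1+H_2} + (\hat{\tau}+\theta)^{1+H_1+H_2}}{1+H_1+H_2} - \frac{(1-\hat{\tau})^{2+H_1+H_2} - 2\hat{\tau}^{2+H_1+H_2} + (1+\hat{\tau})^{2+H_1+H_2}}{(1+H_1+H_2)(2+H_1+H_2)} \right] \qquad (A.14)$$

It is easy to see that this case includes Eq. (2.5) treated in the paper, which is recovered for $\theta = 0$.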
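Case 4: $\hat{\tau} > \theta$ and $\hat{\tau} + \theta > 1$

$$C_{xy}(\hat{\tau},\theta) = n^{H_1+H_2} D_{H_1,H_2} \left[ -\hat{\tau}^{H_1+H_2} + \frac{(1+\hat{\tau}-\theta)^{1+H_1+H_2} - (\hat{\tau}-\theta)^{1+H_1+H_2} - (\hat{\tau}+\theta-1)^{1+H_1+H_2} + (\hat{\tau}+\theta)^{1+H_1+H_2}}{1+H_1+H_2} - \frac{(1-\hat{\tau})^{2+H_1+H_2} - 2\hat{\tau}^{2+H_1+H_2} + (1+\hat{\tau})^{2+H_1+H_2}}{(1+H_1+H_2)(2+H_1+H_2)} \right]$$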
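The four closed forms can be cross-checked against a direct numerical evaluation of the integrals in Eq. (A.11). The sketch below does so with a simple midpoint rule; the function names, the grid size and the test values of $\hat\tau$, $\theta$ and $H_1 + H_2$ are arbitrary choices made for illustration, and only the square bracket is evaluated (the prefactor $n^{H_1+H_2} D_{H_1,H_2}$ is omitted).

```python
import numpy as np

def bracket_closed_form(tau, theta, h):
    """Square bracket of Eq. (A.11) in closed form.

    tau is the scaled lag (0 <= tau < 1), theta the window offset
    (0 <= theta <= 1) and h = H1 + H2.
    """
    p = 1.0 + h
    # Integral of |tau - h'|**h over h' in [-theta, 1 - theta].
    if tau + theta < 1:
        i1 = ((tau + theta) ** p + (1 - tau - theta) ** p) / p
    else:
        i1 = ((tau + theta) ** p - (tau + theta - 1) ** p) / p
    # Integral of |tau + k|**h over k in [-theta, 1 - theta].
    if tau < theta:
        i2 = ((theta - tau) ** p + (1 + tau - theta) ** p) / p
    else:
        i2 = ((1 + tau - theta) ** p - (tau - theta) ** p) / p
    # Double integral; it does not depend on theta.
    i3 = ((1 - tau) ** (p + 1) - 2 * tau ** (p + 1)
          + (1 + tau) ** (p + 1)) / (p * (p + 1))
    return -tau ** h + i1 + i2 - i3

def bracket_numeric(tau, theta, h, m=400):
    """Same bracket by the midpoint rule on m (single) and m x m (double) points."""
    u = -theta + (np.arange(m) + 0.5) / m           # midpoints of [-theta, 1 - theta]
    i1 = np.mean(np.abs(tau - u) ** h)
    i2 = np.mean(np.abs(tau + u) ** h)
    i3 = np.mean(np.abs(tau - u[:, None] + u[None, :]) ** h)
    return float(-tau ** h + i1 + i2 - i3)

# One (tau, theta) pair per case, in the order Case 1 to Case 4.
pairs = [(0.2, 0.5), (0.6, 0.7), (0.5, 0.2), (0.8, 0.4)]
errors = [abs(bracket_closed_form(t, th, 1.27) - bracket_numeric(t, th, 1.27))
          for t, th in pairs]
```

Since the integration interval has unit length, each mean over midpoints approximates the corresponding integral; the discrepancies in `errors` should be at the level of the quadrature error.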